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Abstract 

The weak limit of the normalized number of comparisons needed 
by the Quicksort algorithm to sort n randomly permuted items 
is known to be determined implicitly by a distributional fixed- 
point equation. We give an algorithm for perfect random variate 
generation from this distribution. 
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1 Introduction 



Let C n denote the number of key comparisons needed to sort a list of n 
randomly permuted items by Quicksort. It is known that 

JEC n = 2(n + l)H n - An ~ 2nlnn and Var C n ~ (7 - (2n 2 /3))n 2 , 

where H n denotes the nth harmonic number. Furthermore, 

v _ C n — IEC n 

A n :— ► A 

n 

in distribution. This limit theorem was first obtained by Regnier || by 
an application of the martingale convergence theorem. Rosier |§ gave a 
different proof of this limit law via the contraction method. Rosler's approach 
identifies the distribution of X to be the unique solution with zero mean and 
finite variance of the distributional fixed-point equation 

X = UXW + (1 - U)X^ + g(U), (1) 

where X«, X^ 2 ), and U are independent; X^ and X^ are distributed as 
X] U is uniform [0, 1]; g is given by g(u) := 1 + 2ulnu + 2(1 — u) ln(l — u); 
and = denotes equality in distribution. 

The limit random variable X has finite moments of every order which 
are computable from the fixed point equation ([[]). Tan and Hadjicostas flQ| 



proved that X has a Lebesgue density. Not much else was known rigorously 
about this distribution until Fill and Janson recently derived some properties 
of the limiting density H] and results about the rate of convergence of the 
law of X n to that of X 0. Some of these results are restated for the reader's 
convenience in the next section. 

We develop an algorithm, based on the results of Fill and Janson, which 
returns a perfect sample of the limit random variable X. We assume that we 
have available an infinite sequence of i.i.d. uniform [0, 1] random variables. 
Our solution is based on a modified rejection method, where we use a conver- 
gent sequence of approximations for the density to decide the outcome of a 
rejection test. Such an approach was recently used by Devroye to sample 
perfectly from perpetuities. 
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2 Properties of the quicksort density 



Our rejection sampling algorithm is based on a simple upper bound and an 
approximation of (the unique continuous version of) the Quicksort limit 
density /. We use the following properties of / established in || and ||. 
Let F n denote the distribution function for X n . 

PI. / is bounded §: 

sup f(x) < K :— 16, 

P2. / is infinitely differentiate and || 

sup \f (x)\ <K := 2466, 

P3. With 5 n := (2c/K) 1 /2 n -V6 ) w h er e c := (McK 2 ) 1 / 3 , c := 589, we have @ 



sup 

x£TR 



F n (x+{S n /2))-F n (x-{S n /2)) 



where R n := (432cK 2 ir 3 ) ] 



i/6 n -i/6_ 



By property P2, / is Lipschitz continuous with Lipschitz constant K. 
Therefore, Theorem 3.5 in Devroye [Q, p. 320] implies the upper bound 

f{x) < \/2Kmm(F(x),l - F(x)). 

Here, F denotes the distribution function corresponding to /. Markov's 
inequality yields F(x) = JP(X < x) < (EX 4 )/i 4 for all x < 0. Similarly, 1 — 
F(x) = JP(X > x) < (EX 4 )/x 4 for x > 0. The fourth moment of X can be 
derived explicitly in terms of the zeta function either by Hennequin's formula 
for the cumulants of X (this formula was conjectured in Hennequin |7j and 
proved later in his thesis) or through the fixed point equation ([!]). From ([!]), 
Cramer [0] computed EX 4 = 0.7379 . . . (accurate to the indicated precision), 
so 1EX 4 < 1. Therefore, if we define 

g(x) :=min(K, (2K) 1/2 x~ 2 ) , x 6 M, (2) 

we have / < g. The scaled version g := £g is the density of a probability 
measure for £ := l/||g||£i = [4i^ 1 / 2 (2i^) 1 / 4 ] -1 . A perfect sample from the 
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density g is given by [(2K) 1 ^/K 1 / 2 ]SU 1 /U 2 , with S, U u and U 2 independent; 
Ui and U% uniform [0, 1]; and S an equiprobable random sign (cf. Theorem 3.3 
in Devroye @, p. 315]). 

Remark. According to the results of 0, / enjoys superpolynomial decay 
at ±oo, so certainly / < g for some g of the form g(x) := mm(K,Cx~ 2 ). 
One way to obtain an explicit constant C is to use 

I r°° 

x 2 f(x)<— \(/>"(t)\dt, xeM, 
2tt J-oo 

where is the characteristic function corresponding to /, and to bound 
!</>"(£) | [e.g., by min(ci, c 2 t -2 ) for suitable constants Ci,c 2 ] as explained in 
the proof of Theorem 2.9 in 0. But we find that our approach is just as 
straightforward, and gives a smaller value of C (although we have made no 
attempt to find the best C possible using the Fourier techniques of 0). 



3 The rejection algorithm 

We have found an explicit, integrable upper bound on /. Furthermore, an 
approximation of / with explicit error estimate is given by P3. Let 

, , s . _ F n (x + (5 n /2)) - F n (x - (5 n /2)) 

Jn{X) ■— x 

On 

with 5 n given in P3. Then \ f n {x) — f(x)\ < R n for all x G H, and R n — > 
for n — > oo. 

To calculate the values of f n we require knowledge about the probabilities 
of the events {C n = i}. Let N(n,i) denote the number of permutations of 
n distinct numbers for which Quicksort needs exactly i key comparisons to 
sort. Then 

N(n,i) 



P(C n 



nl 



These probabilities are non-zero only if n — 1 < i < n(n — l)/2. With the 
initializing conventions N(0, 0) := 1 and N(i,0) := for i > 1 and the 
obvious values N(n, i) = for i < n — 1 and for i > n{n — l)/2, we have the 
following recursion for n > 1 and for n — 1 < i < n(n — l)/2: 

n t— (n— 1) 

N(n,i) = Y, J2 N{k-l,l)N(n-k,i-{n-l)-l). 

k=l 1=0 



4 



This recurrence is well known. To verify it, assume that the first pivot 
element is the kth largest element out of n. Then the number of permutations 
leading to i key comparisons is the number N(k — 1,1) of permutations of the 
items less than the pivot element which are sorted with I key comparisons, 
multiplied by the corresponding number of permutations for the elements 
greater than the pivot element, summed over all possible values of k and I. 
Note that n— 1 key comparisons are used for the splitting procedure. Observe 
that we also have IEC n = X^ziV^n, i)/n!. The table (N(n,i) : i < n(n— 1)/2), 
and IEC n , can be computed from the previous tables (N(k,i) : i < k{k — 
l)/2), < k < n, in time 0(n 5 ). Then, observe that, for y < z, 

F n (z) - F n (y) = i £ N(n,i), 

n i — 

EC„+nf/<i<IEC„+nz 

and thus is computable from the table (N(n,i) : i < ra(n — l)/2) and 

EC n in time 0(n(z — y)) = 0(n5 n ) = 0(ra 5 / 6 ). Now, the following rejection 
algorithm gives a perfect sample X from the Quicksort limit distribution F: 

repeat 

generate U, U\, U2 uniform [0,1] 
generate S uniform on { — 1,+1} 

T «- Ug(X) (where ^(x) := min(K, {2Kfl 2 / x 2 )) 

repeat 

n <— n + 1 

compute the full table of N(n,i) for all i < n(n — l)/2 

until |T - F| > i? n 

Accept = [T < Y - RJ 
until Accept 
return X 

This algorithm halts with probability one, and produces a perfect sample 
from the Quicksort limit distribution. The expected number of outer loops 
is \\g^p = AK 1 / 2 (2K) 1 / 4 = 134.1. Note, however, that the constants K 
and K are very crude upper bounds for ||/||oo and ||/'||oo, which from the 
results of numerical calculations reported in JIIJ appear to be on the order 
of 1 and 2, respectively. 
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Moreover, considerable speed-up could be achieved for our algorithm by 
finding another approximation f n to / that either is faster to compute or 
is faster to converge to / (or both). One promising approach, on which we 
hope to report more fully in future work, is to let fi,f 2 ,... be the densities 
one obtains, starting from a suitably nice density fo (say, standard normal), 
by applying the method of successive substitutions to ([!]). Indeed, Fill and 
Janson || show that then f n —*f uniformly at an exponential rate. How- 
ever, one difficulty is that these computations require repeated numerical 
integration, but it should be possible to bound the errors in the numerical 
integrations using calculations similar to those in 

Remark. Let k = k n := |log 2 (n-|-l)J . We noted above that if N(n, i) > 0, 
then n — 1 < i < n(n — l)/2. This observation can be refined. In fact, 
using arguments as in ||, it can be shown that N(n,i) > if and only if 

TTT-n < i < M n , with 

m n := k(n + 1) - 2 k+1 + 2 ~ n\og 2 n = (l/ln2) nlnn = (1.44. . .) n\nn 
= the total path length for the complete tree on n nodes 

and M n := n(n — l)/2. These extreme values satisfy the initial conditions 
mo = = Mq and, for n > 1, the simple recurrences 

m n = m n _i + [log 2 n\ and M n = M n _ x + (n - 1). 
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